An Architecture For A Universal Lexicon: A Case Study on Shared Syntactic Information in Japanese, Hindi, Bengali, Greek, and English
Authors
Naoyuki Nomura, Douglas A. Jones, Robert C. Berwick
Massachusetts Institute of Technology, NE43-802 Artificial Intelligence Laboratory
[email protected]

Abstract
Introduction. Given the prominence of the lexicon in most current linguistic theories (LFG, HPSG, GB), the inventory of language-particular information left in the lexicon deserves special attention. Constructing large computerized lexicons remains a difficult problem, since it means building up a large array of apparently arbitrary information. This paper shows that this arbitrariness can be constrained more than might have been previously thought. In particular, arbitrariness of argument structure, word sense, and paraphrasability will be shown not only to be constrained, but also to be integrally related. Our (radical) view is that variation of lexical behavior across languages is exactly like lexical variation within languages: specifically, the difference lies in the presence or absence of certain morphemes. For example, the fact that Japanese has richer possibilities in certain verbal patterns is derived solely from its morphological inventory.1 Put another way, language parameters simply are the presence or absence of lexical material in the morphological component. Observed language variation patterns reflect morphological systematicity. The generative machinery for producing argument structure positions is fixed across languages.

Linguistic Motivation. A striking example underscoring the universality of argument structure is the familiar Spray/Load alternation2, shown in (1). Despite the many surface differences in these data across languages, they share several essential properties.

(1) a. John loaded the hay on the wagon.
    b. John loaded the wagon with the hay.

Japanese
(2) a. taroo-wa teepu-o boo-ni maita.
       Taro-NOM tape-ACC stick-DAT wrap-PRF
       'Taro wrapped the tape around the stick.'
Similar resources
Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources
This paper describes our experiment on two cross-lingual and one monolingual English text retrieval tasks at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in the two most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approx. 26K H...
Bengali and Hindi to English CLIR Evaluation
Our participation in CLEF 2007 consisted of two cross-lingual and one monolingual text retrieval tasks in the Ad-hoc bilingual track. The cross-language task includes the retrieval of English documents in response to queries in two Indian languages, Hindi and Bengali. The Hindi and Bengali queries were first processed using a morphological analyzer (Bengali), a stemmer (Hindi) and a set of 200 Hindi ...
Automatic Generation of Multilingual Lexicon by Using Wordnet
A lexicon is the heart of any language processing system. Accurate words with grammatical and semantic attributes are essential or highly desirable for any application, be it machine translation, information extraction, various forms of tagging or text mining. However, good quality lexicons are difficult to construct, requiring an enormous amount of time and manpower. In this paper, we present a meth...
Chickpea (Cicer arietinum) | Feedipedia
Chickpea, chick pea, Egyptian bean, gram pea, Bengal gram [English]; garbanzo [Spanish]; pois chiche [French]; grão-de-bico, ervilha-de-bengala [Portuguese]; kikkererwt [Dutch]; Kichererbse [German]; kacang arab [Indonesian]; cece [Italian]; nohut [Turkish]; Đậu gà [Vietnamese]; [Amharic]; الحِمّص [Arabic]; [Bengali]; 鹰嘴豆 [Chinese]; نخود [Farsi]; Ρεβιθιά [Greek]; [Gujarati]; חמצה [Hebrew]; [Hind...
IIT-TUDA: System for Sentiment Analysis in Indian Languages Using Lexical Acquisition
Social networking platforms such as Facebook and Twitter have become very popular communication tools among online users to share and express opinions and sentiment about the surrounding world. The availability of such opinionated text content has drawn much attention in the field of Natural Language Processing. Compared to other languages, such as English, little work has been done for India...